Sains Malaysiana 53(4)(2024): 907-920
http://doi.org/10.17576/jsm-2024-5304-14
A Remedial Measure of Multicollinearity in Multiple Linear Regression in the Presence of High Leverage Points
SHELAN SAIED ISMAEEL1, HABSHAH MIDI2,* & KURDISTAN M. TAHER OMAR1
1Department of Mathematics, Faculty of Science, University of Zakho, Iraq
2Faculty of Science and Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
Submitted: 14 March 2023 / Accepted: 5 March 2024
Abstract
The ordinary least squares (OLS) method is widely used in the multiple linear regression model because of tradition and its optimal properties. Nonetheless, in the presence of multicollinearity, the OLS method is inefficient because the standard errors of its estimates become inflated. Many methods have been proposed to remedy this problem, including the Jackknife Ridge Regression (JAK) estimator. However, the performance of JAK is poor when multicollinearity and high leverage points (HLPs), which are outlying observations in the X-direction, are present in the data. As a solution to this problem, the Robust Jackknife Ridge MM (RJMM) and Robust Jackknife Ridge GM2 (RJGM2) estimators were put forward. Nevertheless, they are still not very efficient because they suffer from long computational running times, retain some bias, and lack the bounded influence property. This paper proposes a robust Jackknife ridge regression that integrates a generalized M estimator and the fast improvised GT (GM-FIMGT) estimator in its development. We name this method the robust Jackknife ridge regression based on GM-FIMGT, denoted RJFIMGT. The numerical results show that the proposed RJFIMGT method is the best, as it has the smallest RMSE and bias values compared with the other methods in this study.
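For context, the estimators named above can be sketched in their standard textbook forms; this is a minimal sketch, and the choice of the ridge parameter k and the robust weighting used by RJFIMGT are specified in the body of the paper, not here. For the model $y = X\beta + \varepsilon$,
\[
\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{\beta}_{\mathrm{RIDGE}}(k) = (X^{\top}X + kI_{p})^{-1}X^{\top}y, \quad k > 0,
\]
and, in canonical form $y = Z\alpha + \varepsilon$ with $Z = XT$, $Z^{\top}Z = \Lambda = \operatorname{diag}(\lambda_{1},\ldots,\lambda_{p})$ and $\alpha = T^{\top}\beta$, one common form of the jackknifed ridge estimator is
\[
\hat{\alpha}_{\mathrm{JAK}}(k) = \bigl[I_{p} - k^{2}(\Lambda + kI_{p})^{-2}\bigr]\hat{\alpha}_{\mathrm{OLS}},
\qquad
\hat{\alpha}_{\mathrm{OLS}} = \Lambda^{-1}Z^{\top}y,
\]
which trades a small increase in variance for a reduction of the bias introduced by ridge shrinkage. Robust variants such as RJMM, RJGM2 and the proposed RJFIMGT replace the classical components in these formulas with robust (MM, GM or GM-FIMGT) counterparts.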
Keywords: High leverage points; jackknife; MM-estimator; multicollinearity; ridge regression
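As a numerical illustration of why such remedies are needed, the following is a minimal, self-contained sketch rather than the authors' simulation design: the correlation level rho = 0.95, the 5% contamination scheme and the fixed ridge parameter k = 1 are assumptions made for illustration only. It contrasts OLS and ordinary ridge regression under multicollinearity, with and without bad high leverage points, using empirical bias and RMSE as in the abstract.

# Illustrative sketch (not the paper's protocol): multicollinearity and bad
# high leverage points (HLPs), comparing OLS with ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 100, 3, 500
beta_true = np.ones(p)
k = 1.0                                   # ridge parameter, fixed here for illustration
rho = 0.95                                # pairwise correlation among predictors
Sigma = rho * np.ones((p, p)) + (1.0 - rho) * np.eye(p)
L = np.linalg.cholesky(Sigma)

def simulate(contaminate):
    """Return arrays of OLS and ridge estimates over all replications."""
    est_ols, est_ridge = [], []
    for _ in range(reps):
        Xc = rng.standard_normal((n, p)) @ L.T          # collinear predictors
        y = Xc @ beta_true + rng.standard_normal(n)     # response generated from clean X
        X = Xc.copy()
        if contaminate:
            X[:5] += 10.0                               # 5% bad high leverage points in X
        XtX = X.T @ X
        est_ols.append(np.linalg.solve(XtX, X.T @ y))
        est_ridge.append(np.linalg.solve(XtX + k * np.eye(p), X.T @ y))
    return np.asarray(est_ols), np.asarray(est_ridge)

def report(label, est):
    bias = np.abs(est.mean(axis=0) - beta_true).sum()             # total absolute bias
    rmse = np.sqrt(((est - beta_true) ** 2).sum(axis=1).mean())   # root mean squared error
    print(f"{label:18s} bias={bias:.3f}  RMSE={rmse:.3f}")

for contaminate in (False, True):
    tag = "with HLPs" if contaminate else "clean"
    ols, ridge = simulate(contaminate)
    report(f"OLS ({tag})", ols)
    report(f"Ridge ({tag})", ridge)

Comparing the printed bias and RMSE across the four settings illustrates the two problems the paper targets: variance inflation from multicollinearity and distortion from bad leverage points, which motivates robust jackknife ridge estimators.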
*Corresponding author; email: habshah@upm.edu.my